Predicting Scientific Grid Data Transfer Characteristics

Authors

  • William Agnew
  • Michael Fischer
  • Kyle Chard
  • Ian Foster
Abstract

Big data scientists routinely transfer massive amounts of data. By understanding and modelling different aspects of these data transfers, we can make using big data more efficient and user-friendly. In this paper, we first develop a set of heuristics for predicting data storage locations. These heuristics help big data scientists manage and discover the locations to and from which they transfer data. We show, via analysis of historical Globus operations, that our approaches can predict the storage locations accessed by users with 78.2% and 95.5% accuracy for top-1 and top-3 recommendations, respectively. Predicting transfer bandwidth allows better selection of the data replicas to download from, and better scheduling and routing of data transfers. We show that existing bandwidth prediction techniques perform poorly on real-world data and develop heuristics that (performance statistics).

Similar resources

Predicting the Performance of Data Transfer in a Grid Environment

In a Grid environment, implementing a parallel algorithm for data transfer or multiple parallel jobs allocation doesn’t give reliable data transfer. There is a need to predict the data transfer performance before allocating the parallel processes on grid nodes. In this paper we propose a predictive framework for performing efficient data transfer. Our framework considers different phases for pr...

Sampling-Based Tasks Scheduling in Dynamic Grid Environment

In this paper, we propose a new solution for data mining task scheduling in a Grid environment. First, we propose a sample-based evaluation of application run time. Then, we propose a cost model for predicting the data transfer time on the Grid. Finally, according to the a priori estimation of the application response time and the data transfer time, we propose the method for tasks scheduling in grid enviro...

Predicting Sporadic Grid Data Transfers

The increasingly common practice of (1) replicating datasets and (2) using resources as distributed data stores in Grid environments has led to the problem of determining which replica can be accessed most efficiently. Due to diverse performance characteristics and load variations of several components in the end-to-end path linking these various locations, selecting a replica location from am...

OGSA First Impressions - A Case Study Re-engineering a Scientific Application with the Open Grid Services Architecture

We present a case study of our experience re-engineering a scientific application using the Open Grid Services Architecture (OGSA), a new specification for developing Grid applications using web service technologies such as WSDL and SOAP. During the last decade, UCL’s Chemistry department has developed a computational approach for predicting the crystal structures of small molecules. However, e...

Reliable File Transfer in Grid Environments

Grid-based computing environments are becoming increasingly popular for scientific computing. One of the key issues for scientific computing is the efficient transfer of large amounts of data across the Grid. In this poster we present a Reliable File Transfer (RFT) service that significantly improves the efficiency of large-scale file transfer. RFT can detect a variety of failures and restart t...

Publication date: 2016